Introduction

Refactoring is largely opaque to non-technical people. Big refactorings in particular leave stakeholders such as product owners or managers uncertain about the progress of (much-needed) technical improvements.

In this blog post, I want to sketch an idea of how we as developers could communicate the work "under the hood" by using visualization.

We use jQAssistant/Neo4j and Pandas/pygal to report the status of technical improvements.

The idea

Thanks to tools like jQAssistant, we have access to the underlying structures of our software. Almost every entity our software system consists of is connected to other entities via relationships:

  • A Java class depends on a Java type
  • A class may declare fields that are written by a method
  • An interface might have been changed by a Git commit

and so on. All this data is scanned by jQAssistant and stored in the Neo4j graph database.
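
To make the idea of connected entities concrete, here is a minimal sketch in plain Python that models such relationships as (source, relationship, target) triples. The entity names and the `related` helper are made up for illustration, and the relationship names only imitate the style of a code graph; the real data lives in Neo4j and is queried with Cypher.

```python
# Hypothetical entities and relationships, loosely mirroring the list above.
relationships = [
    ("OwnerController", "DEPENDS_ON", "Owner"),
    ("Owner", "DECLARES", "address"),
    ("setAddress", "WRITES", "address"),
    ("commit-A", "MODIFIES", "OwnerRepository"),
]


def related(entity, relationship):
    """Return all targets that `entity` points to via `relationship`."""
    return [target for source, rel, target in relationships
            if source == entity and rel == relationship]


print(related("OwnerController", "DEPENDS_ON"))
```

A graph database answers the same kind of question, only across millions of such triples and with multi-step paths.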

We can also associate problems (or "refactoring opportunities") with the entities of our software system. For example, we can connect findings from FindBugs to classes, detect frequently changing but buggy code, or even create our own code smell detection algorithms for spotting ugly race conditions in our software.
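
As a rough illustration of connecting problems to entities, the following sketch counts findings per class in plain Python. The finding records, their field names, and the `findings_per_class` helper are invented for this example; real findings would come from a tool report that has been linked to the classes in the graph.

```python
from collections import Counter

# Hypothetical findings; each one points at the class it was reported for.
findings = [
    {"class": "OwnerRepositoryImpl", "problem": "unclosed resource"},
    {"class": "OwnerRepositoryImpl", "problem": "string-built SQL"},
    {"class": "VisitController", "problem": "possible null access"},
]


def findings_per_class(findings):
    """Count how many findings are attached to each class."""
    return Counter(finding["class"] for finding in findings)


for name, count in findings_per_class(findings).most_common():
    print(name, count)
```

Classes with the most findings are natural candidates for the "refactoring opportunities" mentioned above.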

Case Study

In our scenario, we want to communicate the effort necessary to change the way we access the database. We've decided that communicating directly over JDBC is no longer a good approach because there are already good alternatives out there, like the Java Persistence API.


In [68]:
import py2neo
import pandas as pd

graph = py2neo.Graph()

query = """
MATCH (c:Class)-[:DECLARES]->(m:Method)
WHERE 
  c.fqn STARTS WITH "org.springframework.samples.petclinic"
OPTIONAL MATCH 
  (c)-[r:DEPENDS_ON]->
    (dbType:Type)-[:EXTENDS]->
      (base:Class {name: "BaseEntity"})
RETURN 
  c.fqn as fqn,
  c.fqn as class,
  COUNT(DISTINCT dbType.fqn) as count,
  MAX(m.lastLineNumber) as lines
ORDER BY fqn
"""

# Cursor.data() returns the result as a list of dicts, one per row
entity_dependencies = pd.DataFrame(graph.run(query).data())
entity_dependencies.head()


Out[68]:
                                               class  count                                                fqn  lines
0  org.springframework.samples.petclinic.Petclini...      0  org.springframework.samples.petclinic.Petclini...    110
1  org.springframework.samples.petclinic.model.Ba...      0  org.springframework.samples.petclinic.model.Ba...     44
2  org.springframework.samples.petclinic.model.Na...      0  org.springframework.samples.petclinic.model.Na...     45
3  org.springframework.samples.petclinic.model.Owner      1  org.springframework.samples.petclinic.model.Owner    151
4  org.springframework.samples.petclinic.model.Pe...      0  org.springframework.samples.petclinic.model.Pe...     53

In [69]:
TEST_COLOR_CODE = 'rgba(0,0,255,{})'
CODE_COLOR_CODE = 'rgba(255,0,0,{})'
OTHER_COLOR_CODE = 'lightgrey'

plot_data = entity_dependencies.copy()
# Share of entity dependencies relative to the most dependent class (0..1)
plot_data['ratio'] = plot_data['count'] / plot_data['count'].max()
is_test = plot_data['class'].str.endswith("Test")
has_deps = plot_data['ratio'] > 0
# Default color for classes without entity dependencies
plot_data['color_code'] = OTHER_COLOR_CODE
# Format the color row by row; calling str.format on the whole Series
# would embed the Series' text representation instead of a single value
plot_data.loc[is_test & has_deps, 'color_code'] = \
    plot_data['ratio'].apply(TEST_COLOR_CODE.format)
plot_data.loc[~is_test & has_deps, 'color_code'] = \
    plot_data['ratio'].apply(CODE_COLOR_CODE.format)
plot_data.head()

In [74]:
import pygal

treemap = pygal.Treemap(
    pygal.Config(show_legend=False))
max_count = entity_dependencies['count'].max()

# One tile per class: the size comes from the 'lines' column,
# the opacity from the class's share of entity dependencies
for _, entry in entity_dependencies.iterrows():
    ratio = entry['count'] / max_count
    data = {
        'value': entry['lines'],
        'color': 'lightgrey' if ratio == 0
        else 'rgba(255,0,0,{})'.format(ratio),
        'label': entry['class'],
    }
    treemap.add(entry['class'], [data])

treemap.render_in_browser()


file://C:/Users/Markus/AppData/Local/Temp/tmpkfpqpt0s.html
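
The color scale used for the tiles can also be pulled out into a small function, which makes the mapping easy to check without a running Neo4j instance. This is only a sketch: `tile_color` is a hypothetical helper name, and treating a maximum count of zero as "no dependencies anywhere" is my own assumption.

```python
def tile_color(count, max_count):
    """Map a dependency count to a treemap tile color.

    Classes without dependencies are light grey; all others are red,
    with an opacity proportional to their share of the maximum count.
    """
    # Assumption: if nothing has dependencies, every tile is grey
    if max_count == 0 or count == 0:
        return 'lightgrey'
    return 'rgba(255,0,0,{})'.format(count / max_count)


print(tile_color(0, 4))  # lightgrey
print(tile_color(1, 4))  # rgba(255,0,0,0.25)
print(tile_color(4, 4))  # rgba(255,0,0,1.0)
```

The more entity dependencies a class has, the more opaque its tile becomes, so the classes that need the most rework stand out at a glance.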

TODO: also include the commits that belong to the fix tickets.